Abstract

In this paper we consider so-called Google matrices and show that every eigenvalue $\lambda$
of such a matrix has the fundamental property $|\lambda| \leq 1$. The stochastic eigenvector
corresponding to $\lambda = 1$, called the PageRank vector, plays a central role in Google’s
software. We study it in detail and present some important problems.

The purpose of the paper is to make the heart of Google clearer for undergraduates.

1 Introduction


Google is one of the important tools for analyzing modern society. In this paper we want to
explain a secret of Google, namely “the heart of Google’s software”, to undergraduates.

Although we are not experts in IT (Information Technology), the secret is clearly expressed
in terms of linear algebra. However, it is almost impossible to solve the linear algebra
problem explicitly, so we need some approximate method.

First, we give a fundamental lemma to understand a Google matrix (see the definition 
in the text) and present an important problem to define a realistic Google matrix (in our 
terminology). The problem is a challenging one for young researchers. For such a matrix we 
can use the power method to obtain the PageRank vector. 

Second, we take up an interesting example from [1] and calculate it thoroughly by use
of MATHEMATICA. A good example and a thorough calculation help undergraduates to
understand.

Last, we show an example for which the power method with the usual initial vector does not
give the PageRank vector, because $H$ is not a realistic Google matrix. For this case we
treat the power method with another initial vector and present a general problem.

We expect that undergraduates will cry out “I got Google!” after reading the paper.

2 Main Result 

We introduce a Google matrix (realistic Google matrix) and study its key property. 

We consider a collection of web pages with links (for example, a homepage and some 
homepages cited in it). See the figure in the next section (eight web pages with several 
links). 

If a page has $k$ links we give the equal weight $1/k$ to each link and construct a column
vector consisting of these weights. See the figure once more. For example, since page 4
links to pages 2, 5 and 6 (three links), each weight is $1/3$. Therefore we obtain the column
vector

$$
\text{page 4} \longrightarrow
\begin{pmatrix} 0 \\ \frac{1}{3} \\ 0 \\ 0 \\ \frac{1}{3} \\ \frac{1}{3} \\ 0 \\ 0 \end{pmatrix}
$$

whose nonzero entries sit in rows 2, 5 and 6.

As a result, the collection of web pages gives a square matrix 


$$
H = (H_{ij}), \quad H_{ij} \geq 0, \quad \sum_{i} H_{ij} = 1 \tag{2.1}
$$

which we will call a Google matrix. Note that $H_{ii} = 0$ for all $i$ (we prohibit the self-citation).
From the definition it is a sparse matrix because the number of links starting from a webpage 
is in general small compared to the number of webpages. 
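
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `LINKS` and `google_matrix` are our own hypothetical choices, the link lists are read off from the columns of the matrix $H$ of Section 3, and we assume that every page has at least one outgoing link.

```python
from fractions import Fraction

# Outgoing links of the eight-page example (page -> pages it links to),
# read off from the columns of the matrix H given in Section 3.
LINKS = {
    1: [2, 3],
    2: [4],
    3: [2, 5],
    4: [2, 5, 6],
    5: [6, 7, 8],
    6: [8],
    7: [1, 5, 8],
    8: [6, 7],
}


def google_matrix(links):
    """Return H as a list of rows; H[i][j] = 1/k if page j+1 (with k links) links to page i+1."""
    n = len(links)
    H = [[Fraction(0)] * n for _ in range(n)]
    for page, targets in links.items():
        weight = Fraction(1, len(targets))  # equal weight 1/k for each link
        for target in targets:
            H[target - 1][page - 1] = weight
    return H


H = google_matrix(LINKS)
n = len(H)

column_4 = [H[i][3] for i in range(n)]                            # column of page 4
column_sums = [sum(H[i][j] for i in range(n)) for j in range(n)]  # each should be 1
diagonal = [H[i][i] for i in range(n)]                            # each should be 0

print([str(w) for w in column_4])  # 1/3 in rows 2, 5 and 6, zero elsewhere
```

Exact fractions are used instead of floating-point numbers so that the conditions in (2.1) can be checked without rounding errors.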

If we set 

$$
J = (1, 1, \cdots, 1)^{T}
$$

where T is the transpose (of a vector or a matrix) then it is easy to see 


$$
H^{T} J = J \tag{2.2}
$$

because the row vectors of $H^{T}$ are the transposes of the column vectors of $H$, like

$$
\text{page 4} \longrightarrow \left( 0, \tfrac{1}{3}, 0, 0, \tfrac{1}{3}, \tfrac{1}{3}, 0, 0 \right).
$$

From this we know that 1 is an eigenvalue of $H^{T}$. By the way, the eigenvalues of $H$ are equal
to those of $H^{T}$ because

$$
0 = |\lambda E - H| = |\lambda E - H^{T}|,
$$

so we conclude that 1 is also an eigenvalue of $H$.
Therefore, we have the equation 

$$
H I = I \tag{2.3}
$$

where we assume that the eigenvector $I$ is stochastic (the sum of all entries is 1). This $I$ is
called the PageRank vector and plays a central role in Google.

Now, we give a fundamental lemma on Google matrices:

Lemma Let $\lambda$ be any eigenvalue of a Google matrix $H$. Then we have

$$
|\lambda| \leq 1. \tag{2.4}
$$

The proof is easy and is derived from Gerschgorin’s (circle) theorem [2]. Note that
the eigenvalues of $H$ are equal to those of $H^{T}$ and the sum of all entries of each row of
$H^{T}$ is 1 (see for example (3.2)). Namely,

$$
\sum_{j=1}^{n} (H^{T})_{ij} = \sum_{j=1}^{n} H_{ji} = 1
\quad \text{and} \quad
(H^{T})_{ii} = H_{ii} = 0 \tag{2.5}
$$

for all $i$.

We are in a position to state Gerschgorin’s theorem. Let $A = (a_{ij})$ be an $n \times n$ complex
(real in our case) matrix, and we set

$$
R_{i} = \sum_{j=1,\, j \neq i}^{n} |a_{ij}|
$$

and

$$
D(a_{ii}; R_{i}) = \{ z \in \mathbb{C} \mid |z - a_{ii}| \leq R_{i} \}
$$

for each $i$. This is a closed disc centered at $a_{ii}$ with radius $R_{i}$, called the Gerschgorin disc.

Theorem (Gerschgorin) For any eigenvalue $\lambda$ of $A$ we have

$$
\lambda \in \bigcup_{i=1}^{n} D(a_{ii}; R_{i}). \tag{2.6}
$$

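The proof is simple. Let us consider the equation

$$
A x = \lambda x \quad (x \neq 0) \tag{2.7}
$$

and let $|x_{i}|$ be the maximum

$$
|x_{i}| = \max\{ |x_{1}|, |x_{2}|, \cdots, |x_{n}| \} \neq 0
\quad \Longrightarrow \quad
\left| \frac{x_{j}}{x_{i}} \right| \leq 1 .
$$

From (2.7) we have

$$
\sum_{j=1}^{n} a_{ij} x_{j} = \lambda x_{i}
\quad \Longleftrightarrow \quad
\sum_{j=1,\, j \neq i}^{n} a_{ij} x_{j} = \lambda x_{i} - a_{ii} x_{i} = (\lambda - a_{ii}) x_{i} .
$$

$x_{i} \neq 0$ gives

$$
\lambda - a_{ii} = \sum_{j=1,\, j \neq i}^{n} a_{ij} \frac{x_{j}}{x_{i}}
$$

and we have

$$
|\lambda - a_{ii}|
= \left| \sum_{j=1,\, j \neq i}^{n} a_{ij} \frac{x_{j}}{x_{i}} \right|
\leq \sum_{j=1,\, j \neq i}^{n} \left| a_{ij} \frac{x_{j}}{x_{i}} \right|
= \sum_{j=1,\, j \neq i}^{n} |a_{ij}| \left| \frac{x_{j}}{x_{i}} \right|
\leq \sum_{j=1,\, j \neq i}^{n} |a_{ij}| = R_{i} .
$$

This means $\lambda \in D(a_{ii}; R_{i})$ for some $i$ and completes the proof.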

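Finally, let us complete our lemma. We apply the theorem to $H^{T}$, whose eigenvalues are those
of $H$. In our case $H_{ij} \geq 0$, $H_{ii} = 0$ and $R_{i} = 1$ for all $i$ and $j$ by (2.5), so every
Gerschgorin disc is $D(0; 1)$, and these give the result

$$
|\lambda| \leq 1
$$

for any eigenvalue $\lambda$ of $H$. This is indeed a fundamental property of Google matrices.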
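
As a small illustration of the argument, the Gerschgorin data can be computed directly. The sketch below is only an example under stated assumptions: it uses the four-page Google matrix of Section 4 and its eigenvalues $\pm 1, \pm 1/2$ as given there, and the variable names (`discs`, `inside`) are our own.

```python
from fractions import Fraction

# Google matrix of the four-page example of Section 4 (each column sums to 1).
ROWS = [
    "0 1/2 0   0",
    "1 0   1/2 0",
    "0 1/2 0   1",
    "0 0   1/2 0",
]
H = [[Fraction(x) for x in row.split()] for row in ROWS]
n = len(H)

# Transpose: the rows of H^T are the columns of H.
HT = [[H[j][i] for j in range(n)] for i in range(n)]

# Gerschgorin data of H^T: centre a_ii and radius R_i = sum over j != i of |a_ij|.
discs = []
for i in range(n):
    centre = HT[i][i]
    radius = sum(abs(HT[i][j]) for j in range(n) if j != i)
    discs.append((centre, radius))

# Eigenvalues of this H as stated in Section 4.
eigenvalues = [Fraction(1), Fraction(-1), Fraction(1, 2), Fraction(-1, 2)]
inside = [any(abs(lam - c) <= r for c, r in discs) for lam in eigenvalues]

print([(str(c), str(r)) for c, r in discs])  # every disc has centre 0 and radius 1
print(inside)
```

Every disc coincides with the closed unit disc, which is exactly why the lemma holds for any Google matrix and not only for this example.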


A comment is in order. The lemma must have been known already. However, we could not find
such a reference despite our efforts.

Let us go ahead. In order to construct the eigenvector $I$ in (2.3), a method called the
power method is very convenient for a sparse matrix $H$ of huge size. To calculate the
characteristic polynomial is practically impossible.

The method is very simple, pQ. A sequence { I n } is defined recurrently by 


$$
I_{n} = H I_{n-1} \quad \text{and} \quad I_{0} = e_{1} \tag{2.8}
$$

where the initial vector is $e_{1} = (1, 0, \cdots, 0)^{T}$, which is the standard choice. This is also
rewritten as

$$
I_{n} = H^{n} I_{0} = H^{n} e_{1} .
$$

If $\{I_{n}\}$ converges to $I$ then we obtain the equation (2.3) like

$$
H I = H \left( \lim_{n \to \infty} I_{n} \right) = \lim_{n \to \infty} H I_{n} = \lim_{n \to \infty} I_{n+1} = I .
$$
In order that the power method works correctly some assumption on $H$ is required.
Namely,

(Q) For a set of eigenvalues $\{\lambda_{1} = 1, \lambda_{2}, \cdots, \lambda_{n}\}$ we assume

$$
\lambda_{1} = 1 > |\lambda_{2}| \geq \cdots \geq |\lambda_{n}|. \tag{2.9}
$$

Note that 1 is a simple root. The assumption may be strong.

If a Google matrix $H$ satisfies (2.9) we call $H$ a realistic Google matrix. Now, let us
present an important

Problem For a huge sparse matrix $H$ propose a method to find or to estimate the second
eigenvalue $\lambda_{2}$ without calculating the characteristic polynomial.

As far as we know such a method has not been given in Mathematical Physics or in 
Quantum Mechanics. This is a challenging problem for mathematical physicists. 

3 Example 

We consider an interesting example given in [1] and calculate it thoroughly by use of
MATHEMATICA. A good example helps undergraduates to understand a model deeply.

In this section we need some results from Linear Algebra, so see for example [5] or [3]
(we don’t know a standard textbook of Linear Algebra in Europe or America or elsewhere).

Example : a collection of web pages with links (see footnote 1)

Footnote 1: It is not easy for us to draw a (free) curve by use of the free software WinTpic.
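The Google matrix for this graph is given by

$$
H =
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{3} & 0 \\
\frac{1}{2} & 0 & \frac{1}{2} & \frac{1}{3} & 0 & 0 & 0 & 0 \\
\frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \frac{1}{2} & \frac{1}{3} & 0 & 0 & \frac{1}{3} & 0 \\
0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 & \frac{1}{3} & 1 & \frac{1}{3} & 0
\end{pmatrix} \tag{3.1}
$$

and its transpose is

$$
H^{T} =
\begin{pmatrix}
0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{2} & 0 & 0 & \frac{1}{2} & 0 & 0 & 0 \\
0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\frac{1}{3} & 0 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} \\
0 & 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0
\end{pmatrix}. \tag{3.2}
$$

If we define a stochastic vector

$$
J = \left( \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8}, \tfrac{1}{8} \right)^{T}
$$

it is easy to see

$$
H^{T} J = J. \tag{3.3}
$$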

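Let us study $H$ from the mathematical viewpoint by use of MATHEMATICA. The
characteristic polynomial of $H$ is given by

$$
f(\lambda) = |\lambda E - H| =
\begin{vmatrix}
\lambda & 0 & 0 & 0 & 0 & 0 & -\frac{1}{3} & 0 \\
-\frac{1}{2} & \lambda & -\frac{1}{2} & -\frac{1}{3} & 0 & 0 & 0 & 0 \\
-\frac{1}{2} & 0 & \lambda & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & \lambda & 0 & 0 & 0 & 0 \\
0 & 0 & -\frac{1}{2} & -\frac{1}{3} & \lambda & 0 & -\frac{1}{3} & 0 \\
0 & 0 & 0 & -\frac{1}{3} & -\frac{1}{3} & \lambda & 0 & -\frac{1}{2} \\
0 & 0 & 0 & 0 & -\frac{1}{3} & 0 & \lambda & -\frac{1}{2} \\
0 & 0 & 0 & 0 & -\frac{1}{3} & -1 & -\frac{1}{3} & \lambda
\end{vmatrix}
$$

$$
= \lambda (\lambda - 1) \left( \lambda^{6} + \lambda^{5} - \frac{1}{9} \lambda^{4} - \frac{1}{6} \lambda^{3}
+ \frac{7}{108} \lambda^{2} + \frac{11}{216} \lambda + \frac{1}{72} \right). \tag{3.4}
$$

The exact solutions are $\{\lambda_{1} = 1, \lambda_{8} = 0\}$ and approximate ones (we round off a real
number at the fifth decimal place, i.e. to four decimal places, like $-0.87021\cdots \approx -0.8702$)
are given by

$$
\lambda_{2} = -0.8702, \quad \lambda_{3} = -0.5568,
$$

$$
\lambda_{4} = 0.4251 - 0.2914 i, \quad \lambda_{5} = 0.4251 + 0.2914 i,
$$

$$
\lambda_{6} = -0.2116 - 0.2512 i, \quad \lambda_{7} = -0.2116 + 0.2512 i.
$$

From these we have

$$
\lambda_{1} = 1 > |\lambda_{2}| > |\lambda_{3}| > |\lambda_{4}| = |\lambda_{5}| > |\lambda_{6}| = |\lambda_{7}| > \lambda_{8} = 0. \tag{3.5}
$$

$H$ is therefore a realistic Google matrix by (2.9).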
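
For readers without MATHEMATICA, the factorization (3.4) can also be checked with exact rational arithmetic. The following is a minimal sketch, not the computation used in the paper: it assumes the Faddeev-LeVerrier recursion as the method for the characteristic polynomial, and the function names (`char_poly`, `divide_by_linear`, `evaluate`) are our own.

```python
from fractions import Fraction

# The Google matrix H of (3.1), one string per row.
ROWS = [
    "0   0   0   0   0   0   1/3 0",
    "1/2 0   1/2 1/3 0   0   0   0",
    "1/2 0   0   0   0   0   0   0",
    "0   1   0   0   0   0   0   0",
    "0   0   1/2 1/3 0   0   1/3 0",
    "0   0   0   1/3 1/3 0   0   1/2",
    "0   0   0   0   1/3 0   0   1/2",
    "0   0   0   0   1/3 1   1/3 0",
]
H = [[Fraction(x) for x in row.split()] for row in ROWS]
n = len(H)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients of det(lambda*E - A), highest power first (Faddeev-LeVerrier)."""
    coeffs = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[-1]          # M_k = A M_{k-1} + c I
        AM = matmul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs


def divide_by_linear(coeffs, root):
    """Synthetic division by (lambda - root): returns (quotient, remainder)."""
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + root * out[-1])
    return out[:-1], out[-1]


def evaluate(coeffs, z):
    """Horner evaluation in floating point (z may be complex)."""
    value = 0
    for c in coeffs:
        value = value * z + float(c)
    return value


f = char_poly(H)
quotient, r0 = divide_by_linear(f, 0)         # factor lambda
sextic, r1 = divide_by_linear(quotient, 1)    # factor (lambda - 1)
print([str(c) for c in sextic])

# Residuals of the sextic at the rounded eigenvalues listed in the text.
approx = [-0.8702, -0.5568, 0.4251 + 0.2914j, -0.2116 + 0.2512j]
residuals = [abs(evaluate(sextic, z)) for z in approx]
```

Both remainders vanish, so $\lambda = 0$ and $\lambda = 1$ are exact roots, and the residuals at the rounded eigenvalues are small because those values are correct only to four decimal places.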

Moreover, the eigenvector for $\lambda_{1} = 1$ is given by

$$
I = (24, 27, 12, 27, 39, 81, 72, 118)^{T}.
$$

To check this (by hand) is not difficult and is a good exercise for undergraduates. Since the sum
of all entries of $I$ is 400, the stochastic eigenvector (= the PageRank vector) $I$ becomes

$$
I =
\begin{pmatrix}
24/400 \\ 27/400 \\ 12/400 \\ 27/400 \\ 39/400 \\ 81/400 \\ 72/400 \\ 118/400
\end{pmatrix}
=
\begin{pmatrix}
0.06 \\ 0.0675 \\ 0.03 \\ 0.0675 \\ 0.0975 \\ 0.2025 \\ 0.18 \\ 0.295
\end{pmatrix}. \tag{3.6}
$$


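As a result, the ranking of webpages becomes

$$
\text{page 8} > \text{page 6} > \text{page 7} > \text{page 5} > \text{page 2} = \text{page 4} > \text{page 1} > \text{page 3}. \tag{3.7}
$$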


See the figure once more. 
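
The check of the eigenvector and of the ranking can also be done by machine. The sketch below is an illustration only; the variable names (`v`, `Hv`, `ranking`) are ours, and ties such as pages 2 and 4 are simply listed in page order.

```python
from fractions import Fraction

# The Google matrix H of (3.1), one string per row.
ROWS = [
    "0   0   0   0   0   0   1/3 0",
    "1/2 0   1/2 1/3 0   0   0   0",
    "1/2 0   0   0   0   0   0   0",
    "0   1   0   0   0   0   0   0",
    "0   0   1/2 1/3 0   0   1/3 0",
    "0   0   0   1/3 1/3 0   0   1/2",
    "0   0   0   0   1/3 0   0   1/2",
    "0   0   0   0   1/3 1   1/3 0",
]
H = [[Fraction(x) for x in row.split()] for row in ROWS]
n = len(H)

v = [24, 27, 12, 27, 39, 81, 72, 118]            # candidate eigenvector
Hv = [sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]

total = sum(v)                                   # normalizing constant
pagerank = [Fraction(x, total) for x in v]       # stochastic eigenvector

# Pages sorted by decreasing PageRank; the sort is stable, so tied pages keep page order.
ranking = sorted(range(1, n + 1), key=lambda p: pagerank[p - 1], reverse=True)

print(Hv == v, total)
print(ranking)
```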

Here, let us show the power method to obtain the PageRank vector $I$, which is very
useful if a realistic Google matrix is huge. A sequence $\{I_{n}\}$ is defined as

$$
I_{n} = H I_{n-1} \quad \text{and} \quad I_{0} = (1, 0, 0, 0, 0, 0, 0, 0)^{T}
$$

or

$$
I_{n} = H^{n} I_{0} .
$$

If the condition $|\lambda_{2}| < 1$ holds then we have

$$
\lim_{n \to \infty} I_{n} = I
$$

because $H$ can be diagonalized to be

$$
H = S\, \mathrm{diag}(1, \lambda_{2}, \cdots, \lambda_{8})\, S^{-1}
\quad \Longrightarrow \quad
H^{n} = S\, \mathrm{diag}(1, \lambda_{2}^{n}, \cdots, \lambda_{8}^{n})\, S^{-1}
$$

with a matrix $S$ consisting of eigenvectors. The speed of convergence depends on $|\lambda_{2}|$.
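
The diagonalization argument can be written out as follows. This is a sketch under the assumptions that $H$ is diagonalizable and satisfies (2.9); the symbols $v_{k}$ (eigenvectors), $c_{k}$ (expansion coefficients), $\mathbf{1} = (1, \cdots, 1)^{T}$, the constant $C$ and the iteration index $m$ (used here to avoid a clash with the matrix size $n$) are our own notation.

```latex
% Assumptions: H is diagonalizable, (2.9) holds, and v_1 = I is the PageRank vector.
% Columns of H sum to 1, i.e. \mathbf{1}^{T} H = \mathbf{1}^{T}, which is (2.2) transposed.
\begin{align*}
e_{1} &= c_{1} I + \sum_{k=2}^{n} c_{k} v_{k}, \qquad H v_{k} = \lambda_{k} v_{k}, \\
I_{m} = H^{m} e_{1} &= c_{1} I + \sum_{k=2}^{n} c_{k} \lambda_{k}^{m} v_{k}, \\
\mathbf{1}^{T} v_{k} = \mathbf{1}^{T} H v_{k} &= \lambda_{k} \mathbf{1}^{T} v_{k}
  \ \Longrightarrow\ \mathbf{1}^{T} v_{k} = 0 \quad (\lambda_{k} \neq 1), \\
1 = \mathbf{1}^{T} e_{1} &= c_{1} \mathbf{1}^{T} I = c_{1}, \\
\| I_{m} - I \| &\leq \sum_{k=2}^{n} |c_{k}| \, |\lambda_{k}|^{m} \, \| v_{k} \| \leq C \, |\lambda_{2}|^{m}.
\end{align*}
```

In particular every $I_{m}$ is again stochastic, and the error decays at least like $|\lambda_{2}|^{m}$ under these assumptions.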
Let us list the calculation (rule: a real number is rounded off to four decimal places):

$$
I_{40} =
\begin{pmatrix}
0.0601 \\ 0.0675 \\ 0.0299 \\ 0.0676 \\ 0.0976 \\ 0.2022 \\ 0.1797 \\ 0.2954
\end{pmatrix},
\quad
I_{45} =
\begin{pmatrix}
0.0600 \\ 0.0675 \\ 0.0300 \\ 0.0675 \\ 0.0975 \\ 0.2024 \\ 0.1800 \\ 0.2951
\end{pmatrix},
\quad
I_{50} =
\begin{pmatrix}
0.0600 \\ 0.0675 \\ 0.0300 \\ 0.0675 \\ 0.0975 \\ 0.2024 \\ 0.1799 \\ 0.2951
\end{pmatrix},
\quad
I_{55} =
\begin{pmatrix}
0.0600 \\ 0.0675 \\ 0.0300 \\ 0.0675 \\ 0.0975 \\ 0.2025 \\ 0.1800 \\ 0.2950
\end{pmatrix}
= I. \tag{3.8}
$$


The result must be related to the powers of $|\lambda_{2}| \approx 0.87$ like

$$
(0.87)^{40} = 0.0038, \quad (0.87)^{45} = 0.0019, \quad (0.87)^{50} = 0.0009, \quad (0.87)^{55} = 0.0005. \tag{3.9}
$$

Problem Clarify a relation between $I_{n}$ and $(0.87)^{n}$.
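
As a starting point for experiments on this problem, the power method can be run with exact fractions and the error compared with the powers of $|\lambda_{2}|$. This is only a sketch under our own choices: the error is measured by the largest entry of $|I_{n} - I|$, the value 0.8702 is the rounded $|\lambda_{2}|$ from the text, and the names `step` and `errors` are hypothetical.

```python
from fractions import Fraction

# The Google matrix H of (3.1), one string per row.
ROWS = [
    "0   0   0   0   0   0   1/3 0",
    "1/2 0   1/2 1/3 0   0   0   0",
    "1/2 0   0   0   0   0   0   0",
    "0   1   0   0   0   0   0   0",
    "0   0   1/2 1/3 0   0   1/3 0",
    "0   0   0   1/3 1/3 0   0   1/2",
    "0   0   0   0   1/3 0   0   1/2",
    "0   0   0   0   1/3 1   1/3 0",
]
H = [[Fraction(x) for x in row.split()] for row in ROWS]
n = len(H)

# PageRank vector (3.6) as exact fractions.
exact = [Fraction(v, 400) for v in (24, 27, 12, 27, 39, 81, 72, 118)]


def step(x):
    """One power-method step x -> H x."""
    return [sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]


x = [Fraction(1)] + [Fraction(0)] * (n - 1)    # I_0 = e_1
errors = {}
for k in range(1, 56):
    x = step(x)                                # x is now I_k
    if k in (40, 45, 50, 55):
        errors[k] = max(abs(a - b) for a, b in zip(x, exact))
        # iterate (four decimals), error, and error divided by 0.8702^k
        print(k, [round(float(a), 4) for a in x],
              float(errors[k]), float(errors[k]) / 0.8702 ** k)
```

The printed iterates can be compared with (3.8), and the last column indicates how closely the error follows $|\lambda_{2}|^{n}$.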


4 Counter Example 

We show an example for which the power method with the usual initial vector $e_{1}$ does not
give the PageRank vector, because $H$ is not a realistic Google matrix.

Example : a collection of web pages with links

The Google matrix for this graph is given by

$$
H =
\begin{pmatrix}
0 & \frac{1}{2} & 0 & 0 \\
1 & 0 & \frac{1}{2} & 0 \\
0 & \frac{1}{2} & 0 & 1 \\
0 & 0 & \frac{1}{2} & 0
\end{pmatrix}. \tag{4.1}
$$
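The characteristic polynomial of $H$ is given by

$$
f(\lambda) = |\lambda E - H| = \lambda^{4} - \frac{5}{4} \lambda^{2} + \frac{1}{4}
= (\lambda^{2} - 1) \left( \lambda^{2} - \frac{1}{4} \right) \tag{4.2}
$$

and the solutions are

$$
\lambda = \pm 1, \ \pm \frac{1}{2}. \tag{4.3}
$$

Therefore, $H$ is not a realistic Google matrix because of $\lambda = -1$. See (2.9) once more. For
$H$ it is easy to see that the PageRank vector is given by

$$
I =
\begin{pmatrix} \frac{1}{6} \\ \frac{2}{6} \\ \frac{2}{6} \\ \frac{1}{6} \end{pmatrix}
=
\begin{pmatrix} 0.1667 \\ 0.3333 \\ 0.3333 \\ 0.1667 \end{pmatrix}. \tag{4.4}
$$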

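We show that $I$ is not obtained by the power method. In fact, it is easy to see

$$
I_{2n} = H^{2n} e_{1} =
\begin{pmatrix} a_{n} \\ 0 \\ b_{n} \\ 0 \end{pmatrix}
\quad \text{and} \quad
I_{2n+1} = H^{2n+1} e_{1} =
\begin{pmatrix} 0 \\ c_{n} \\ 0 \\ d_{n} \end{pmatrix} \tag{4.5}
$$

where we don’t need exact values of $a_{n}$, $b_{n}$, $c_{n}$, $d_{n}$. As a result, $\{I_{n}\}$ does not converge.

Next, as a trial we change the initial vector. For example we set

$$
J_{n} = H^{n} J_{0} \quad \text{and} \quad
J_{0} = \begin{pmatrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{pmatrix} \tag{4.6}
$$

because of $H^{T} J_{0} = J_{0}$. Let us list the calculation: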


$$
J_{10} =
\begin{pmatrix} 0.1667 \\ 0.3333 \\ 0.3333 \\ 0.1667 \end{pmatrix},
\quad
J_{11} =
\begin{pmatrix} 0.1666 \\ 0.3334 \\ 0.3334 \\ 0.1666 \end{pmatrix},
\quad
J_{12} =
\begin{pmatrix} 0.1667 \\ 0.3333 \\ 0.3333 \\ 0.1667 \end{pmatrix}
= I. \tag{4.7}
$$

From the result $n = 10$ is enough.
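
Both behaviours of this section can be reproduced with a few lines of exact arithmetic. The sketch below is illustrative only; the helper name `iterate` and the chosen step counts (20 and 21 for the oscillation, 10 for the uniform start) are our own choices.

```python
from fractions import Fraction

# The Google matrix H of (4.1), one string per row.
ROWS = [
    "0 1/2 0   0",
    "1 0   1/2 0",
    "0 1/2 0   1",
    "0 0   1/2 0",
]
H = [[Fraction(x) for x in row.split()] for row in ROWS]
n = len(H)


def iterate(x, steps):
    """Apply the power-method step x -> H x the given number of times."""
    for _ in range(steps):
        x = [sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


e1 = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]   # usual initial vector
J0 = [Fraction(1, 4)] * n                                   # uniform initial vector

even = iterate(e1, 20)   # I_20: entries 2 and 4 vanish
odd = iterate(e1, 21)    # I_21: entries 1 and 3 vanish
J10 = iterate(J0, 10)    # J_10: already close to the PageRank vector

print([str(v) for v in even])
print([str(v) for v in odd])
print([round(float(v), 4) for v in J10])
```

Starting from $e_{1}$ the iterates keep jumping between two patterns of zeros, as in (4.5), while the uniform start removes the component belonging to $\lambda = -1$.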

Last, we present an important

Problem We speculate that $J_{0} = (1/n, 1/n, \cdots, 1/n)^{T}$ is in general better than
$e_{1} = (1, 0, \cdots, 0)^{T}$ as an initial vector. Study this point in detail.